-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove maybeHead from BackwardSyncContext #5497
Conversation
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
It's usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
|
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
maybeHead = Optional.empty(); | ||
} else { | ||
maybeHead = Optional.of(headHash); | ||
public synchronized void maybeUpdateTargetHeight(final Hash headHash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename method to reflect what is still does after removing maybeHead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not wrong this is redundant, and the method can be removed, since the update of the target height is done in the syncBackwardsUntil
method, that is called just after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will consider removing this and possiblyMoveHead
in a subsequent PR. I want to study the code a bit more and don't want to invalidate this PR's testing.
if (blockchain.contains(head)) { | ||
LOG.debug("Changing head to {}", head); | ||
blockchain.rewindToBlock(head); | ||
if (blockchain.getChainHead().getHash().equals(lastSavedBlock.getHash())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of maybeHead, I am checking the passed in hash (which should originate from the FCU). I'm not sure if this check is useful at all now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also removed the blockchain.contains(hash) check. Any reason for that? Could possiblyMoveHead be called before we save the block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously it was only acting on maybeHead and that is now removed. There wasn't an equivalent check for lastSavedBlock.
possiblyMoveHead is currently only called immediately after appending the block to the chain
Lines 314 to 317 in 688d2dd
this.getProtocolContext() | |
.getBlockchain() | |
.appendBlock(block, optResult.getYield().get().getReceipts()); | |
possiblyMoveHead(block); |
I'm wondering whether to remove possiblyMoveHead entirely now as I'm not sure we would ever rewind to the last saved block (should always match this if condition I think)...
Would be good to get a second opinion from @fab-10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code possiblyMoveHead
could be removed, but I am still if thinking if this method was introduced to manage cases when there was a reorg, and still do not have a full answer, anyway the amount of tests that you have done make me confident
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if (blockchain.contains(head)) { | ||
LOG.debug("Changing head to {}", head); | ||
blockchain.rewindToBlock(head); | ||
if (blockchain.getChainHead().getHash().equals(lastSavedBlock.getHash())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also removed the blockchain.contains(hash) check. Any reason for that? Could possiblyMoveHead be called before we save the block?
@@ -128,6 +138,13 @@ protected Optional<List<Block>> processResponse( | |||
// Clear processed headers | |||
headers.clear(); | |||
} | |||
LOG.atDebug() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be too noisy for debug level. Does this create a lot of logs? Might be better at the trace level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In backwards sync context, it's a single line per batch (max 200 blocks). I think single line per BWS batch is reasonable for debug, maybe listing 200 hashes in one line is a bit much though.
I have not considering if this task is used outside of backwards sync context, so will rethink.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an example:
{"@timestamp":"2023-05-19T07:44:58,628","level":"DEBUG","thread":"nioEventLoopGroup-3-5","class":"GetBodiesFromPeerTask","message":"Associated 115 bodies with 200 headers to get 115 blocks with these hashes: [0xa581c89e8006f754a2cdbb4e3865fd59ac9d6130ab21a16d5d0e8f706e21bc30, 0x1f131c3e95d9b4f78a917109901f3e3ecbbce9520fe2cfbf1733deee42d7e8e5, 0x91ed8c0f5115ec63ba073e8bf11e5856cc7d3d9db0a8105bf7aa760349b768ad, 0x8c5619c66be3dab809698e84de8da101ad8cf477504cef2fd3b72f24e6d6f54a, 0xbf332140d0951dd5d01795813499280c0dadfbf3552571b5a7d2a6cedbd1a5db, 0xa50935a301403d2aaf851c5d188716b1282efa2f1c79bb2e183fe3374ab7307c, 0x1a1dc2de4d10c1ad4ba704d66ebd998cd48992b5be9ccc95baa3a2b4fa5fe678, 0xd8e837589d740eb6c64e04aebc8035304d469301701f6bcf5d764878e37b9c1d, 0x6dfa35f17466743faa9ea6aa92745f79184cffaf4e364a659e1664223b2c4987, 0x1f6236f7529d303fd22a9e52df56862ef2772d7d4f107001c3636f0bc9b7fb91, 0xd304b678eea0f3867f211ced3303fcf5d6b895f404b7be53a182f81f4f612829, 0xdb89afdb0c1a10ac42a89c431a1441f2005b848b3e5881e43f034a4eda7a5257, 0x00854bddf02b950ca668c6ba7c305824c29176e62230b1a9754d431e33907a39, 0xe486b9bbfeaae405804d9f9a2cd8d70c0ceae4051b693ca1f7d202a8ad12458c, 0x5bdba83ee27fe3764fba8038d8e166ddf7b4346ce281b876431fd6a07ee947a1, 0x4e1648a532269400ece24d8ac289647310c54c2739baecbd1803406453ee67e1, 0xe21d2d355af70cc7654e00c37756fea5d5072267c19f153e0f850b388f63176d, 0x750fcb1d22697c636cdbaaf3dbb03d48347e663cea2457d2fdc9296442613c3b, 0x3b441965e617373e6f1f36885a22e8ca4a94d54c668ef996e86d7d3d98454355, 0x09cfa35441f7d2bcee08c5a6ab3f90011c54a8441272e5d65625c0c197cf0085, 0x7c81e63d1519cdd7b2211b70d78392c05a9a70dae10d5a48a91bf5cdeeaea96e, 0xb016d66512399c08933216673b2f33e3de0ffd739533ed05f698f455aea10b09, 0xd7b20a6ec1a16b46960e7ebefdf98d6a60ea09ade1cf5b1235d4f8a0bda300ce, 0x68e16fd9663d2d4c11c96f32b8d40d09016ed653b6838ae6347405156f9a8733, 0x1c3805a1592c5a6986636aae1e3b22f8554f83d75bee33a05a62c8eeeb55c408, 0x1cd4b9b7a96fa0c827beb5eeb0b67af3af0c39991be0492b21791c73b1977f89, 0x15d198b06bbfc017a1e743b475c8a978884b71e0149749bacd673b09155485d7, 0xd5305d13eb18c68a9b221986f418d55a7759dd339aa3c87d631fbb25d78299e4, 0x402028f66ac65d87404e74a17031c6cd16af78406704ca2870112331f1f318b8, 0x5fbf8bec2d8f8ecf27efe0bb9d95ef6c13af4fd5aa2599a53afd17bf5c7bc8c8, 0xe75a096d86fd1b3419472c55aa053ef231316f69e4d74d7462541e951f9f77dc, 0x60adc414f638dc63177f57dac2478c02e6413f2bc1dfc80be40f53f057372c5a, 0x25f4d56ecd836db24e4b7809673313ac1c35c22509b9032e26d16546a44374a1, 0x77ffd9b13e832abb38b12a5a4d91c715a243c3ee49ddeab603c50456d491f280, 0xe844dcae98df79029eb64c7581553b584944c6eecadb0d1fac2e5f4cb1c25971, 0xb89a4edbbe17c318cfb09e681e364d7e28a0c7309ace51310b30d148b4b1aa67, 0xa76d8127f1d803e5cbdc6b2c92c8f564b2b9ed54cd8db2595d8f647cf6a78ea3, 0x5c6a6aede955e3473c54c5a1c28d649c0d7452099cb3413cf2d0d220f73c6a2c, 0xffea7a23b3c43fb2b489a9e87f2d3c2457ad1d7b22e6ebdded015c3b6ff4fe9e, 0x85d900c9a2c386dbc23810a28168b2ddfa5c55138f2c59f042fe10f762cfe6ed, 0xa13394b891d86bfda335ac284208cf9d95e700870563a939cbeb197106141104, 0xe36d327166b3e877ea7e936d4b5fb28a2fd8d2619c781203b31394e50162a312, 0xaaf171611461dcdbccf46a1b24a0f4854c56742927fc14cd2ea74adf159ea69d, 0x5ce013fc908759386f605974f6bfcdd31d4023052ec11c465d2ce1803333c11d, 0x5b080b6ef641d2d669fe515de59637a95a752c38a49be4019e5fb3acc2a02188, 0x31133ff76cb75f2d2685ecea1e5a9a247261101986d9bd34550139788337fed5, 0x663e46591e5b011aab0a034c30fb52cd28828cf53b8b98d825c485743df4c5f0, 0x7146a32c598f451e531ba4dcaea977af78a5a49643c2da6f20317598b51f34e7, 0x9155e93ab1f9a8d28a5ab747afc89e7185d5e62deb7eb31dc44f24e90d18c4ae, 0x4167e1aa38583afb1294d20c42988d5971a950a4472eb7c1dcb64ddf79c4b0db, 0xed014895537f9f21c51aea4727c17b9199797c29bb8222e9e449c265ed75832e, 0xd40f4c5205c94e31dddfbba205a0a7d4d80956f6772b294b02ff21dca76e0679, 0x96e37c2045ad9a17bd343ceccd807d81a5f65effd9c00ed15fb282228d42d52f, 0x95d05790173515011123a806c786ce1dda2cc307ed1b499f40ccb7ac3762e3d6, 0x43708dc664bf0ddc241ab61ccf028363b537be6fcec9224c9de4f7339092c875, 0x15e73586d7868a0bd296be832040dc761d83eea8ec4bf9b90fe8d9770c3f6531, 0xc174e48efe4f2d739a5be2d0092789c9d4afe36d64d84bc17abd811c36c91064, 0xcff0cb3ebfb4ca51abacf748c6af951e1bf8c7ac3ee426292bc9467a37b479df, 0xf05e9111f026a97ca88c96d82c5ba7e167305b9c2e42ef8dbe9a55974e3a35c3, 0x6c3e8853383f246d1479e3b0111bcff0235d0dbfffe384b177cb55c7097d256c, 0xbed8c70c362da01efdcea7833d17f18f5bb520c6f30aea543e997f467289dd12, 0xfdb746d5604b2521d3521459e4dae6ace8cc5c5dfd3dea95e797f2117ef2a0a4, 0x131d1da9cd6474942c401bf5bdc1f7e1926b064ad5d671424078610edb2c0f59, 0x1164dce33cf33761294089993fcee9f2077d56d2894f75ecb74e3de577a88279, 0x09a3ef620a4215da7deed28d62fe2333a63833fcf63b4c01b46b5f03990345f7, 0x3a5c17f68b0c4069f1f3fa2363e28b774a8b4881cf9468274f063628babcc884, 0xdd914af3c646a53c7932237513165ee2e0a4d6b2ff0b5ccb3ac4f77ddb2cf93e, 0x8e1e0ae61cc95fedb0b6c2e97d12c92760a896f6f7f9147f6d96a45c301a61fb, 0x5b48aa0aff118a47b640c7f7620f422a4d011a9d1acbbbd960ad50e6ad214738, 0x7c774a7466bda226acc4c602b8c37c261569ac53c6d048bac52e35034622601c, 0xc8b4fc8395a0c7e7df16412dbff0f1c564e354ba97ca18dc40267bfd5afa62a8, 0xb098b4a327601d9778a709ec2f4a0a4ded8fef89c0ad3993bad7165c6d064583, 0xabf2c2df39eefcdb744bd004efb1fa9e94b154e67071b6f331928290450a3420, 0xa2a4109dfe9d8dc3416b60e432a68240d98749ceab5e5ab52840e943070372aa, 0x4b8515f540f871293a0da856ce88708d8ab62d9eff15da7a8a54b50393cc3ac3, 0x4872be7ea3e00c2471b39b89b5a629d7af21ea42aacf286e4f30e5c5b46e6914, 0xa1082cb18ed1c8e087577aadb537f9e4ebf762fc748c0ad08799172720bc8717, 0x8956971daab3cf9d936863f59bf5bcacaddd4f8588cba9344722bf0c20fb7695, 0xc7e1719c24402b5d2ffd1929df0cf0ff5e1e2a1697ed4a37d8be5db202e2485a, 0x19c9350a054dcbb88fe8738591f4504ac793fcfb0133b22d9c84a91773cd3f49, 0x625babcccedb59b9fc7c0742d4668a4357593320dbf93c791ff8e3ac6bb1a3de, 0x0c90873baf7975256e49f3112f27186dbf3691736200fc368daab3f5bb2ac709, 0x137616ca5b1fc2e5308083d64b19e95efe404127ff07fd392e906d4e412c141a, 0xf8e60dd6f80b3c48f76f53e7482ee69c1c7132f6bff125deba1fd28ca6c8fb76, 0x5c95c4d427d99486bae531a2659a75cc84185e8d30d1ce593a5aa55dc66935f5, 0x410867343202a76c0313c6e0f1cdbf831cc2b77af77f8fda26ec917483769efe, 0xb10480862f567485fb5fd7e5ef33b1cc2977d7f0d90302930297578336df8940, 0x3c1a03f9ed9771a9e14ec683cb7aa683065621d62ef33b62f9772c645bee0ad6, 0x8d4d768779446a6c49de849b61bd40237abf080c6834cbdc2635ccc0f05dd6d6, 0x69b82a64dda8732646b72b451323123c9af4eee39a996cfa72cfb8af1aa2c7a3, 0x8cf25c7d806456e82fe2ebc816ee4818cc3e1405857f3f480d1661f0c2d76a51, 0xdfd598d458f90107582091316156db07bd525a4980cfaf32a49f72ecd0359305, 0x68fb2a0b022831c5940b22fbfe31f77e940b3ce3e8d4b36feee7fb87e18593d4, 0xd833eeb9bcdadbfb6d3484681c97a531087f00d827c7face401a4b0f72e18bb9, 0x2fefd045e3d83d392256a5eb78b24501ed199fe0e2f36f0bf4f933da53ac3aa5, 0xc838b833310a5f53cb50b4edc9c1105fd95cd59bbfc9162ec59f59ef89d9a0cd, 0x34abafca91f5c32b3fed2cf19efc90f54265de7c4561a32cd3aaaae5e3ef3aef, 0x74cf886674e59f30a9ed62c2fb955f441e17e5c320d8ab877d20723c6ddd2c83, 0xe904cbb38089a91928d5c67207ec1d5eef2120a25450a7697f7dc144f69b1aba, 0x02c0b59b46da014a63d2327fc118c23cb5f2f2314c5355cb042341c5422a8e46, 0xd90685f7e17f49cf00c903d0af42775355fba320260ed000756311faf5e57686, 0xad0745a0d8f3e57cddd0ff3cd80a85862a71adee7f0c02e3775bc2676eb09926, 0x318055a0d35e21b1a0cef131b3389126bf794aec93ce45aa9cd275f82ca4bf99, 0xdb417ec856807c1d0b046e561b9469dd1cb2fa829a2df4ad2b44f244f5b4e304, 0x40e187eae307e95920b10d06f62cd1438dcd119ee1ced90633f119ec9e71cd65, 0xdfe1574cf4286282ef3090be6b6ab7520193bdcdf9f530b6fe972be10964978d, 0xbf6f6b77735bd56a47d468fff63664a6df55b68ea6febd2346dbaebefa1bd346, 0x9d16408cc2d98f710c1864e2b5d3c552d7cfdf0e3c1747160351092b4aa6fbd9, 0xd74f918fc549024a362e5770653cf4f383cb7ce951f0fb494938e5daeb404fee, 0xc6b8e328025517420a52a90fd7364e75d8e6b3d2982e3bd19c6163a5bf19d9e0, 0x0186fd9dc402914d031e01b272e552305bd4a9311b444ec4ede2a556269c21c2, 0xa5f6fe9d2e949de3348158b2716320247c0397ec78717a983de271806cf642d2, 0xdc8c1ed57e2287af318af110f305c1effbc534754c405252ac6cc30c84aaa291, 0x6437ac9c5325d524a65d0acfddb4588b78d936950a7e687d1c33c1fd9895fa8d, 0x0854cd3923ef76ef8d63ad44903f04d51f2c58a4800be6fec53a0a19be98e13e]","throwable":""}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I agree it's a bit much for DEBUG, I'll prob move to trace.
Another option could be to print first and last header of batch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved to trace for now. This also made me realise I had left the logging of the bodies in there which is pretty useless without the headers (which is why I created this debug log afterwards)
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
not sure if it's a really important scenario but I don't see
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some minor log opts
.../eth/src/main/java/org/hyperledger/besu/ethereum/eth/manager/task/GetBodiesFromPeerTask.java
Outdated
Show resolved
Hide resolved
...h/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java
Outdated
Show resolved
Hide resolved
...h/src/main/java/org/hyperledger/besu/ethereum/eth/sync/backwardsync/BackwardSyncContext.java
Outdated
Show resolved
Hide resolved
maybeHead = Optional.empty(); | ||
} else { | ||
maybeHead = Optional.of(headHash); | ||
public synchronized void maybeUpdateTargetHeight(final Hash headHash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not wrong this is redundant, and the method can be removed, since the update of the target height is done in the syncBackwardsUntil
method, that is called just after
if (blockchain.contains(head)) { | ||
LOG.debug("Changing head to {}", head); | ||
blockchain.rewindToBlock(head); | ||
if (blockchain.getChainHead().getHash().equals(lastSavedBlock.getHash())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code possiblyMoveHead
could be removed, but I am still if thinking if this method was introduced to manage cases when there was a reorg, and still do not have a full answer, anyway the amount of tests that you have done make me confident
…anager/task/GetBodiesFromPeerTask.java lazy log Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net> Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
…ync/backwardsync/BackwardSyncContext.java Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net> Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
…ync/backwardsync/BackwardSyncContext.java Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net> Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Running these tests now with nimbus. I believe behaviour will be the same as if it's CL-only downtime since besu just sits there waiting to be driven by CL (and ocassionally serving peers) |
@matkt Nimbus-besu worked for these tests. For the short downtime, I did actually get a |
Signed-off-by: Simon Dudley <simon.dudley@consensys.net>
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync. Add more backwards sync logging. Signed-off-by: Simon Dudley <simon.dudley@consensys.net> Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net> Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync. Add more backwards sync logging. Signed-off-by: Simon Dudley <simon.dudley@consensys.net> Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net> Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
This reverts commit 9be3ca9.
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync. Add more backwards sync logging. Signed-off-by: Simon Dudley <simon.dudley@consensys.net> Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net> Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Its usage is not clear and it's causing besu to rewind the head to it (a large reorg) when combined with nimbus backwards sync. Add more backwards sync logging. Signed-off-by: Simon Dudley <simon.dudley@consensys.net> Co-authored-by: Fabio Di Fabio <fabio.difabio@consensys.net> Co-authored-by: Sally MacFarlane <macfarla.github@gmail.com>
Summary of Changes
maybeHead
. When backwards sync is performed, this can cause besu to incorrectly jump back to a much older block instead of a recent block.Context
This is a partial fix for issues noticed on recent nimbus-besu combos: #5411
Note, there are still issues due to nimbus behaviour, so there is instability when restarting Nimbus for a period of about an hour where it gets in sync but falls out again repeatedly.
It should also help with any CL that causes a backwards sync in Besu.
It should prevent the issue where we see large "reorgs" in the logs e.g.:
because instead of rewinding back to an incorrectly maintained
maybeHead
hash, it will rewind back to the hash that was given to it in the FCU or newPayload.Note, we still see these reorgs when Prysm lock-steps syncs in batches of ~50 newPayloads
Testing
This goal of this testing was to ensure the node could get back into sync in each scenario (and also explore the various CL syncing strategies but that's not relevant for this PR). Testing was done on Sepolia, unless otherwise mentioned
Impact
Scenario 1 with Nimbus (the main driver for this fix) before and after the fix:
BEFORE
AFTER
After being restarted, Nimbus on Sepolia takes some time to find peers, so it is maybe getting more towards Scenario 2. No lock-step is attempted, instead Nimbus causes a few small backward syncs with some halts in between.
The main difference with before is that besu is only backward syncing a few blocks rather than many hundreds due to the Besu bug.